Research Question

How do funding disparities suggest an underrepresentation of Black and Hispanic youth in high-cost sports in traditional U.S. public schools?

Problem Statement

Sports specialization involves year-round training and competition and requires costly investments in participation, travel, and equipment fees, which creates significant financial barriers for youth from lower socioeconomic status (SES) backgrounds. In addition, public school funding disparities can limit access to appropriate facilities, personnel, or physical education, which could further hinder sports participation opportunities for youth in lower-SES communities. These disparities can contribute to the underrepresentation of Black and Hispanic youth in sports with high financial barriers, such as hockey, gymnastics, and tennis, whereas sports such as track and field are less expensive and therefore more accessible.

Potential Subtopics

  • Correlation between public school funding and facility quality
  • Connection between SES and physical activity/education

Data Definition

Public School Characteristics 2022-23

Last Updated: October 21, 2024

https://catalog.data.gov/dataset/public-school-characteristics-2022-23-451db

The National Center for Education Statistics (NCES) gathers demographic and geographic data about U.S. public schools, along with factors such as enrollment and Title I status. The dataset also includes counts of students eligible for free or reduced-price lunch, from which eligibility percentages can be derived. By analyzing this dataset together with the YRBSS, researchers could examine patterns between lower-SES students or schools and rates of physical activity.

Additional Datasets of Interest

Nutrition, Physical Activity, and Obesity - Youth Risk Behavior Surveillance System

Last Updated: February 4, 2025

https://catalog.data.gov/dataset/nutrition-physical-activity-and-obesity-youth-risk-behavior-surveillance-system

Conducted by the Centers for Disease Control and Prevention (CDC), the Youth Risk Behavior Surveillance System (YRBSS) monitors health behaviors in middle and high school students nationwide. It collects data regarding physical activity and nutrition, along with geographic and socioeconomic factors. This data could be used to further research on the impact socioeconomic factors have on health behaviors.

Data Collection

In [1]:
import numpy as np                
import pandas as pd              
import matplotlib.pyplot as plt   
import seaborn as sns               

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

import warnings
warnings.filterwarnings('ignore')

Read the Data

In [2]:
# Load the NCES Public School Characteristics file into a DataFrame
psChar_23 = pd.read_csv('Public_School_Characteristics_2022-23.csv')
In [3]:
psChar_23.head(7)
Out[3]:
X Y OBJECTID NCESSCH SURVYEAR STABR LEAID ST_LEAID LEA_NAME SCH_NAME LSTREET1 LSTREET2 LCITY LSTATE LZIP LZIP4 PHONE CHARTER_TEXT VIRTUAL GSLO GSHI SCHOOL_LEVEL STATUS SCHOOL_TYPE_TEXT SY_STATUS_TEXT ULOCALE NMCNTY TOTFRL FRELCH REDLCH DIRECTCERT PK KG G01 G02 G03 G04 G05 G06 G07 G08 G09 G10 G11 G12 G13 UG AE TOTMENROL TOTFENROL TOTAL MEMBER FTE STUTERATIO AMALM AMALF AM ASALM ASALF AS BLALM BLALF BL HPALM HPALF HP HIALM HIALF HI TRALM TRALF TR WHALM WHALF WH LATCOD LONCOD
0 -86.206200 34.26020 1 10000500870 2022-2023 AL 100005 AL-101 Albertville City Albertville Middle School 600 E Alabama Ave NaN Albertville AL 35950 (256)878-2341 No Not Virtual 07 08 Middle 1 Regular School Currently operational 32-Town: Distant Marshall County 697 654 43 587 NaN NaN NaN NaN NaN NaN NaN NaN 440.0 450.0 NaN NaN NaN NaN NaN NaN NaN 459.0 431.0 890.0 890.0 45.000000 19.78 4.0 1.0 5.0 4.0 2.0 6.0 15.0 14.0 29.0 0.0 1.0 1.0 251.0 251.0 502.0 17.0 15.0 32.0 168.0 147.0 315.0 34.26020 -86.206200
1 -86.204900 34.26220 2 10000500871 2022-2023 AL 100005 AL-101 Albertville City Albertville High School 402 E McCord Ave NaN Albertville AL 35950 2322 (256)894-5000 No Not Virtual 09 12 High 1 Regular School Currently operational 32-Town: Distant Marshall County 1254 1178 76 1059 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 493.0 442.0 390.0 387.0 NaN NaN NaN 868.0 844.0 1712.0 1712.0 85.199997 20.09 0.0 2.0 2.0 4.0 5.0 9.0 23.0 34.0 57.0 0.0 0.0 0.0 490.0 468.0 958.0 26.0 19.0 45.0 325.0 316.0 641.0 34.26220 -86.204900
2 -86.220100 34.27330 3 10000500879 2022-2023 AL 100005 AL-101 Albertville City Albertville Intermediate School 901 W McKinney Ave NaN Albertville AL 35950 1300 (256)878-7698 No Not Virtual 05 06 Middle 1 Regular School Currently operational 32-Town: Distant Marshall County 718 665 53 570 NaN NaN NaN NaN NaN NaN 412.0 462.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 451.0 423.0 874.0 874.0 43.000000 20.33 1.0 4.0 5.0 4.0 0.0 4.0 22.0 28.0 50.0 0.0 0.0 0.0 263.0 241.0 504.0 7.0 6.0 13.0 154.0 144.0 298.0 34.27330 -86.220100
3 -86.221806 34.25270 4 10000500889 2022-2023 AL 100005 AL-101 Albertville City Albertville Elementary School 145 West End Drive NaN Albertville AL 35950 (256)894-4822 No Not Virtual 03 04 Elementary 1 Regular School Currently operational 32-Town: Distant Marshall County 723 680 43 583 NaN NaN NaN NaN 430.0 444.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 463.0 411.0 874.0 874.0 43.000000 20.33 0.0 4.0 4.0 1.0 3.0 4.0 22.0 16.0 38.0 0.0 0.0 0.0 261.0 236.0 497.0 11.0 16.0 27.0 168.0 136.0 304.0 34.25270 -86.221806
4 -86.193300 34.28980 5 10000501616 2022-2023 AL 100005 AL-101 Albertville City Albertville Kindergarten and PreK 257 Country Club Rd NaN Albertville AL 35951 3927 (256)878-7922 No Not Virtual PK KG Elementary 1 Regular School Currently operational 32-Town: Distant Marshall County 392 367 25 240 133.0 473.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 304.0 302.0 606.0 606.0 26.000000 23.31 1.0 3.0 4.0 2.0 0.0 2.0 26.0 23.0 49.0 0.0 0.0 0.0 167.0 152.0 319.0 4.0 4.0 8.0 104.0 120.0 224.0 34.28980 -86.193300
5 -86.221800 34.25330 6 10000502150 2022-2023 AL 100005 AL-101 Albertville City Albertville Primary School 1100 Horton Rd NaN Albertville AL 35950 2532 (256)878-6611 No Not Virtual 01 02 Elementary 1 Regular School Currently operational 32-Town: Distant Marshall County 779 726 53 617 0.0 NaN 427.0 517.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 498.0 446.0 944.0 944.0 61.000000 15.48 9.0 1.0 10.0 3.0 0.0 3.0 24.0 21.0 45.0 0.0 1.0 1.0 290.0 256.0 546.0 9.0 10.0 19.0 163.0 157.0 320.0 34.25330 -86.221800
6 -86.254153 34.53375 7 10000600193 2022-2023 AL 100006 AL-048 Marshall County Kate Duncan Smith DAR Middle 6077 Main St NaN Grant AL 35747 (256)728-5950 No Not Virtual 05 08 Middle 1 Regular School Currently operational 42-Rural: Distant Marshall County 151 123 28 194 NaN NaN NaN NaN NaN NaN 95.0 97.0 86.0 86.0 NaN NaN NaN NaN NaN NaN NaN 192.0 172.0 364.0 364.0 22.030001 16.52 1.0 3.0 4.0 0.0 0.0 0.0 2.0 0.0 2.0 0.0 0.0 0.0 6.0 8.0 14.0 5.0 9.0 14.0 178.0 152.0 330.0 34.53375 -86.254153
In [4]:
psChar_23.tail(7)
Out[4]:
X Y OBJECTID NCESSCH SURVYEAR STABR LEAID ST_LEAID LEA_NAME SCH_NAME LSTREET1 LSTREET2 LCITY LSTATE LZIP LZIP4 PHONE CHARTER_TEXT VIRTUAL GSLO GSHI SCHOOL_LEVEL STATUS SCHOOL_TYPE_TEXT SY_STATUS_TEXT ULOCALE NMCNTY TOTFRL FRELCH REDLCH DIRECTCERT PK KG G01 G02 G03 G04 G05 G06 G07 G08 G09 G10 G11 G12 G13 UG AE TOTMENROL TOTFENROL TOTAL MEMBER FTE STUTERATIO AMALM AMALF AM ASALM ASALF AS BLALM BLALF BL HPALM HPALF HP HIALM HIALF HI TRALM TRALF TR WHALM WHALF WH LATCOD LONCOD
101383 -64.932456 18.352146 101384 780003000020 2022-2023 VI 7800030 VI-001 Saint Thomas - Saint John School District JOSEPH SIBILLY ELEMENTARY SCHOOL 14 15 16 ESTATE ELIZABETH NaN Saint Thomas VI 802 (340)774-7001 N Not Virtual PK 06 Elementary 1 Regular School Currently operational 33-Town: Remote St. Thomas Island 228 228 0 -1 19.0 25.0 25.0 25.0 31.0 34.0 34.0 38.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 121.0 110.0 231.0 231.0 16.0 14.44 0.0 0.0 0.0 2.0 2.0 4.0 99.0 93.0 192.0 0.0 0.0 0.0 8.0 5.0 13.0 2.0 1.0 3.0 10.0 9.0 19.0 18.352146 -64.932456
101384 -64.793916 18.330464 101385 780003000022 2022-2023 VI 7800030 VI-001 Saint Thomas - Saint John School District JULIUS E SPRAUVE 14 18 ESTATE ENIGHED NaN Saint John VI 831 (340)776-6336 N Not Virtual PK 08 Elementary 1 Regular School Currently operational 33-Town: Remote St. John Island 199 199 0 -1 8.0 21.0 16.0 21.0 14.0 24.0 20.0 26.0 27.0 25.0 NaN NaN NaN NaN NaN NaN NaN 103.0 99.0 202.0 202.0 20.0 10.10 1.0 0.0 1.0 0.0 0.0 0.0 79.0 68.0 147.0 0.0 0.0 0.0 22.0 29.0 51.0 0.0 0.0 0.0 1.0 2.0 3.0 18.330464 -64.793916
101385 -64.917602 18.341950 101386 780003000024 2022-2023 VI 7800030 VI-001 Saint Thomas - Saint John School District LOCKHART ELEMENTARY SCHOOL 41 ESTATE THOMAS NaN Saint Thomas VI 802 (340)775-0820 N Not Virtual KG 03 Elementary 1 Regular School Currently operational 33-Town: Remote St. Thomas Island 295 295 0 -1 NaN 77.0 75.0 69.0 77.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 171.0 127.0 298.0 298.0 18.0 16.56 0.0 0.0 0.0 4.0 3.0 7.0 132.0 92.0 224.0 0.0 0.0 0.0 33.0 30.0 63.0 1.0 2.0 3.0 1.0 0.0 1.0 18.341950 -64.917602
101386 -64.952483 18.338742 101387 780003000026 2022-2023 VI 7800030 VI-001 Saint Thomas - Saint John School District ULLA F MULLER ELEMENTARY SCHOOL 7B ESTATE CONTANT NaN Saint Thomas VI 802 (340)774-0059 N Not Virtual KG 06 Elementary 1 Regular School Currently operational 33-Town: Remote St. Thomas Island 417 417 0 -1 NaN 52.0 53.0 51.0 47.0 70.0 79.0 68.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 200.0 220.0 420.0 420.0 28.0 15.00 0.0 2.0 2.0 2.0 4.0 6.0 167.0 182.0 349.0 0.0 0.0 0.0 27.0 27.0 54.0 2.0 0.0 2.0 2.0 5.0 7.0 18.338742 -64.952483
101387 -64.899024 18.354782 101388 780003000027 2022-2023 VI 7800030 VI-001 Saint Thomas - Saint John School District YVONNE BOWSKY ELEMENTARY SCHOOL 15B and 16 ESTATE MANDAHL NaN Saint Thomas VI 802 (340)775-3220 N Not Virtual PK 05 Elementary 1 Regular School Currently operational 33-Town: Remote St. Thomas Island 425 425 0 -1 22.0 62.0 67.0 66.0 75.0 68.0 68.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 252.0 176.0 428.0 428.0 34.0 12.59 1.0 1.0 2.0 5.0 4.0 9.0 201.0 144.0 345.0 0.0 0.0 0.0 37.0 22.0 59.0 0.0 1.0 1.0 8.0 4.0 12.0 18.354782 -64.899024
101388 -64.945940 18.336658 101389 780003000033 2022-2023 VI 7800030 VI-001 Saint Thomas - Saint John School District CANCRYN JUNIOR HIGH SCHOOL 1 CROWN BAY NaN Saint Thomas VI 804 (340)774-4540 N Not Virtual 04 08 Middle 1 Regular School Currently operational 33-Town: Remote St. Thomas Island 683 683 0 -1 NaN NaN NaN NaN NaN 77.0 119.0 96.0 189.0 205.0 NaN NaN NaN NaN NaN NaN NaN 361.0 325.0 686.0 686.0 62.0 11.06 0.0 0.0 0.0 2.0 2.0 4.0 279.0 250.0 529.0 0.0 0.0 0.0 74.0 62.0 136.0 0.0 1.0 1.0 6.0 10.0 16.0 18.336658 -64.945940
101389 -64.890311 18.318230 101390 780003000034 2022-2023 VI 7800030 VI-001 Saint Thomas - Saint John School District BERTHA BOSCHULTE JUNIOR HIGH 9 1 and 12A BOVONI NaN Saint Thomas VI 802 (340)775-4222 N Not Virtual 06 08 Middle 1 Regular School Currently operational 33-Town: Remote St. Thomas Island 504 504 0 -1 NaN NaN NaN NaN NaN NaN NaN 145.0 169.0 193.0 NaN NaN NaN NaN NaN NaN NaN 279.0 228.0 507.0 507.0 49.0 10.35 0.0 0.0 0.0 2.0 1.0 3.0 250.0 204.0 454.0 0.0 0.0 0.0 27.0 21.0 48.0 0.0 0.0 0.0 0.0 2.0 2.0 18.318230 -64.890311
In [5]:
psChar_23.shape
Out[5]:
(101390, 77)
  • The dataframe has 101,390 rows of data.
  • The dataframe has 77 columns or features.
  • There are 7,807,030 total cells (101,390 rows × 77 columns) in the dataset, including missing values.
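The total cell count follows directly from the shape. A minimal sketch on a small stand-in frame (the name `toy` and its values are made up for illustration and are not part of the NCES data):

```python
import pandas as pd

# Toy stand-in for psChar_23; values are illustrative only
toy = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, None, 6.0]})

n_rows, n_cols = toy.shape
total_cells = n_rows * n_cols              # equivalent to toy.size
non_null_cells = int(toy.count().sum())    # cells that actually hold a value

print(total_cells, non_null_cells)
```

Reporting the non-null count alongside the total may be more informative here, since many grade columns are mostly empty.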
In [6]:
psChar_23.info(show_counts=True, verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 101390 entries, 0 to 101389
Data columns (total 77 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   X                 101390 non-null  float64
 1   Y                 101390 non-null  float64
 2   OBJECTID          101390 non-null  int64  
 3   NCESSCH           101390 non-null  int64  
 4   SURVYEAR          101390 non-null  object 
 5   STABR             101390 non-null  object 
 6   LEAID             101390 non-null  int64  
 7   ST_LEAID          101390 non-null  object 
 8   LEA_NAME          101390 non-null  object 
 9   SCH_NAME          101390 non-null  object 
 10  LSTREET1          101389 non-null  object 
 11  LSTREET2          572 non-null     object 
 12  LCITY             101390 non-null  object 
 13  LSTATE            101390 non-null  object 
 14  LZIP              101390 non-null  int64  
 15  LZIP4             101390 non-null  object 
 16  PHONE             101390 non-null  object 
 17  CHARTER_TEXT      101390 non-null  object 
 18  VIRTUAL           101390 non-null  object 
 19  GSLO              101390 non-null  object 
 20  GSHI              101390 non-null  object 
 21  SCHOOL_LEVEL      101390 non-null  object 
 22  STATUS            101390 non-null  int64  
 23  SCHOOL_TYPE_TEXT  101390 non-null  object 
 24  SY_STATUS_TEXT    101390 non-null  object 
 25  ULOCALE           101390 non-null  object 
 26  NMCNTY            101390 non-null  object 
 27  TOTFRL            101390 non-null  int64  
 28  FRELCH            101390 non-null  int64  
 29  REDLCH            101390 non-null  int64  
 30  DIRECTCERT        101390 non-null  int64  
 31  PK                32392 non-null   float64
 32  KG                54061 non-null   float64
 33  G01               54412 non-null   float64
 34  G02               54469 non-null   float64
 35  G03               54459 non-null   float64
 36  G04               54258 non-null   float64
 37  G05               53014 non-null   float64
 38  G06               38023 non-null   float64
 39  G07               33224 non-null   float64
 40  G08               33492 non-null   float64
 41  G09               28101 non-null   float64
 42  G10               27889 non-null   float64
 43  G11               27888 non-null   float64
 44  G12               27816 non-null   float64
 45  G13               133 non-null     float64
 46  UG                7889 non-null    float64
 47  AE                183 non-null     float64
 48  TOTMENROL         98910 non-null   float64
 49  TOTFENROL         98910 non-null   float64
 50  TOTAL             99719 non-null   float64
 51  MEMBER            99719 non-null   float64
 52  FTE               97537 non-null   float64
 53  STUTERATIO        99576 non-null   float64
 54  AMALM             98809 non-null   float64
 55  AMALF             98811 non-null   float64
 56  AM                98857 non-null   float64
 57  ASALM             98898 non-null   float64
 58  ASALF             98900 non-null   float64
 59  AS                98906 non-null   float64
 60  BLALM             98896 non-null   float64
 61  BLALF             98893 non-null   float64
 62  BL                98903 non-null   float64
 63  HPALM             98782 non-null   float64
 64  HPALF             98783 non-null   float64
 65  HP                98829 non-null   float64
 66  HIALM             98909 non-null   float64
 67  HIALF             98910 non-null   float64
 68  HI                98910 non-null   float64
 69  TRALM             98903 non-null   float64
 70  TRALF             98905 non-null   float64
 71  TR                98906 non-null   float64
 72  WHALM             98909 non-null   float64
 73  WHALF             98909 non-null   float64
 74  WH                98910 non-null   float64
 75  LATCOD            101390 non-null  float64
 76  LONCOD            101390 non-null  float64
dtypes: float64(48), int64(9), object(20)
memory usage: 59.6+ MB
In [7]:
ps23Cols = psChar_23.columns
ps23Cols
Out[7]:
Index(['X', 'Y', 'OBJECTID', 'NCESSCH', 'SURVYEAR', 'STABR', 'LEAID',
       'ST_LEAID', 'LEA_NAME', 'SCH_NAME', 'LSTREET1', 'LSTREET2', 'LCITY',
       'LSTATE', 'LZIP', 'LZIP4', 'PHONE', 'CHARTER_TEXT', 'VIRTUAL', 'GSLO',
       'GSHI', 'SCHOOL_LEVEL', 'STATUS', 'SCHOOL_TYPE_TEXT', 'SY_STATUS_TEXT',
       'ULOCALE', 'NMCNTY', 'TOTFRL', 'FRELCH', 'REDLCH', 'DIRECTCERT', 'PK',
       'KG', 'G01', 'G02', 'G03', 'G04', 'G05', 'G06', 'G07', 'G08', 'G09',
       'G10', 'G11', 'G12', 'G13', 'UG', 'AE', 'TOTMENROL', 'TOTFENROL',
       'TOTAL', 'MEMBER', 'FTE', 'STUTERATIO', 'AMALM', 'AMALF', 'AM', 'ASALM',
       'ASALF', 'AS', 'BLALM', 'BLALF', 'BL', 'HPALM', 'HPALF', 'HP', 'HIALM',
       'HIALF', 'HI', 'TRALM', 'TRALF', 'TR', 'WHALM', 'WHALF', 'WH', 'LATCOD',
       'LONCOD'],
      dtype='object')
In [8]:
psChar_23 = psChar_23.rename(columns = {'OBJECTID':'ObjectID','NCESSCH':'NCESID','SURVYEAR':'SurveyYear', 
                                        'STABR':'StateABR','LEA_NAME':'LEAname','SCH_NAME':'SchoolName', 
                                        'LSTREET1':'Street1','LSTREET2':'Street2','LCITY':'City',
                                        'LSTATE':'State','LZIP':'Zip','LZIP4':'Zip4', 
                                        'PHONE':'Phone', 'CHARTER_TEXT':'Charter', 'VIRTUAL':'Virtual', 
                                        'GSLO':'LowestGrade','GSHI':'HighestGrade', 
                                        'SCHOOL_LEVEL':'SchoolLevel', 
                                        'STATUS':'Status', 'SCHOOL_TYPE_TEXT':'SchoolType', 
                                        'SY_STATUS_TEXT':'Status_Text',
                                        'ULOCALE':'Locale', 'NMCNTY':'County', 
                                        'TOTFRL':'TotalFreeLunch', 
                                        'FRELCH':'FreeLunch', 'REDLCH':'ReducedLunch', 
                                        'DIRECTCERT':'MealProgramCertified', 'PK':'PreK',
                                        'KG':'Kindergarten', 'G01':'Grade1', 'G02':'Grade2', 
                                        'G03':'Grade3', 'G04':'Grade4', 'G05':'Grade5', 
                                        'G06':'Grade6', 'G07':'Grade7', 'G08':'Grade8', 
                                        'G09':'Grade9','G10':'Grade10', 'G11':'Grade11', 
                                        'G12':'Grade12','G13':'Grade13', 'UG':'Ungraded', 
                                        'AE':'AdultEd', 'TOTMENROL':'TotMaleEnrollment', 
                                        'TOTFENROL':'TotFemaleEnrollment','TOTAL':'TotalEnrollment', 
                                        'MEMBER':'Member', 'FTE':'StaffFTE', 'STUTERATIO':'StudentTeacherRatio', 
                                        'AMALM':'AIANMale','AMALF':'AIANFem', 'AM':'AIANTotal', 
                                        'ASALM':'AsianMale', 'ASALF':'AsianFemale', 'AS':'AsianTotal', 
                                        'BLALM':'BlackMale','BLALF':'BlackFemale', 'BL':'BlackTotal', 
                                        'HPALM':'HPIMale', 'HPALF':'HPIFemale', 'HP':'HPITotal', 
                                        'HIALM':'HispanicMale','HIALF':'HispanicFemale', 'HI':'HispanicTotal', 
                                        'TRALM':'TRMale', 'TRALF':'TRFemale', 'TR':'TRTotal', 
                                        'WHALM':'WhiteMale','WHALF':'WhiteFemale', 'WH':'WhiteTotal', 
                                        'LATCOD':'Latitude','LONCOD':'Longitude'})

ps23Cols = psChar_23.columns
psChar_23.head()
Out[8]:
X Y ObjectID NCESID SurveyYear StateABR LEAID ST_LEAID LEAname SchoolName Street1 Street2 City State Zip Zip4 Phone Charter Virtual LowestGrade HighestGrade SchoolLevel Status SchoolType Status_Text Locale County TotalFreeLunch FreeLunch ReducedLunch MealProgramCertified PreK Kindergarten Grade1 Grade2 Grade3 Grade4 Grade5 Grade6 Grade7 Grade8 Grade9 Grade10 Grade11 Grade12 Grade13 Ungraded AdultEd TotMaleEnrollment TotFemaleEnrollment TotalEnrollment Member StaffFTE StudentTeacherRatio AIANMale AIANFem AIANTotal AsianMale AsianFemale AsianTotal BlackMale BlackFemale BlackTotal HPIMale HPIFemale HPITotal HispanicMale HispanicFemale HispanicTotal TRMale TRFemale TRTotal WhiteMale WhiteFemale WhiteTotal Latitude Longitude
0 -86.206200 34.2602 1 10000500870 2022-2023 AL 100005 AL-101 Albertville City Albertville Middle School 600 E Alabama Ave NaN Albertville AL 35950 (256)878-2341 No Not Virtual 07 08 Middle 1 Regular School Currently operational 32-Town: Distant Marshall County 697 654 43 587 NaN NaN NaN NaN NaN NaN NaN NaN 440.0 450.0 NaN NaN NaN NaN NaN NaN NaN 459.0 431.0 890.0 890.0 45.000000 19.78 4.0 1.0 5.0 4.0 2.0 6.0 15.0 14.0 29.0 0.0 1.0 1.0 251.0 251.0 502.0 17.0 15.0 32.0 168.0 147.0 315.0 34.2602 -86.206200
1 -86.204900 34.2622 2 10000500871 2022-2023 AL 100005 AL-101 Albertville City Albertville High School 402 E McCord Ave NaN Albertville AL 35950 2322 (256)894-5000 No Not Virtual 09 12 High 1 Regular School Currently operational 32-Town: Distant Marshall County 1254 1178 76 1059 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 493.0 442.0 390.0 387.0 NaN NaN NaN 868.0 844.0 1712.0 1712.0 85.199997 20.09 0.0 2.0 2.0 4.0 5.0 9.0 23.0 34.0 57.0 0.0 0.0 0.0 490.0 468.0 958.0 26.0 19.0 45.0 325.0 316.0 641.0 34.2622 -86.204900
2 -86.220100 34.2733 3 10000500879 2022-2023 AL 100005 AL-101 Albertville City Albertville Intermediate School 901 W McKinney Ave NaN Albertville AL 35950 1300 (256)878-7698 No Not Virtual 05 06 Middle 1 Regular School Currently operational 32-Town: Distant Marshall County 718 665 53 570 NaN NaN NaN NaN NaN NaN 412.0 462.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 451.0 423.0 874.0 874.0 43.000000 20.33 1.0 4.0 5.0 4.0 0.0 4.0 22.0 28.0 50.0 0.0 0.0 0.0 263.0 241.0 504.0 7.0 6.0 13.0 154.0 144.0 298.0 34.2733 -86.220100
3 -86.221806 34.2527 4 10000500889 2022-2023 AL 100005 AL-101 Albertville City Albertville Elementary School 145 West End Drive NaN Albertville AL 35950 (256)894-4822 No Not Virtual 03 04 Elementary 1 Regular School Currently operational 32-Town: Distant Marshall County 723 680 43 583 NaN NaN NaN NaN 430.0 444.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 463.0 411.0 874.0 874.0 43.000000 20.33 0.0 4.0 4.0 1.0 3.0 4.0 22.0 16.0 38.0 0.0 0.0 0.0 261.0 236.0 497.0 11.0 16.0 27.0 168.0 136.0 304.0 34.2527 -86.221806
4 -86.193300 34.2898 5 10000501616 2022-2023 AL 100005 AL-101 Albertville City Albertville Kindergarten and PreK 257 Country Club Rd NaN Albertville AL 35951 3927 (256)878-7922 No Not Virtual PK KG Elementary 1 Regular School Currently operational 32-Town: Distant Marshall County 392 367 25 240 133.0 473.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 304.0 302.0 606.0 606.0 26.000000 23.31 1.0 3.0 4.0 2.0 0.0 2.0 26.0 23.0 49.0 0.0 0.0 0.0 167.0 152.0 319.0 4.0 4.0 8.0 104.0 120.0 224.0 34.2898 -86.193300
In [9]:
psChar_23.isnull().sum()
Out[9]:
X                            0
Y                            0
ObjectID                     0
NCESID                       0
SurveyYear                   0
StateABR                     0
LEAID                        0
ST_LEAID                     0
LEAname                      0
SchoolName                   0
Street1                      1
Street2                 100818
City                         0
State                        0
Zip                          0
Zip4                         0
Phone                        0
Charter                      0
Virtual                      0
LowestGrade                  0
HighestGrade                 0
SchoolLevel                  0
Status                       0
SchoolType                   0
Status_Text                  0
Locale                       0
County                       0
TotalFreeLunch               0
FreeLunch                    0
ReducedLunch                 0
MealProgramCertified         0
PreK                     68998
Kindergarten             47329
Grade1                   46978
Grade2                   46921
Grade3                   46931
Grade4                   47132
Grade5                   48376
Grade6                   63367
Grade7                   68166
Grade8                   67898
Grade9                   73289
Grade10                  73501
Grade11                  73502
Grade12                  73574
Grade13                 101257
Ungraded                 93501
AdultEd                 101207
TotMaleEnrollment         2480
TotFemaleEnrollment       2480
TotalEnrollment           1671
Member                    1671
StaffFTE                  3853
StudentTeacherRatio       1814
AIANMale                  2581
AIANFem                   2579
AIANTotal                 2533
AsianMale                 2492
AsianFemale               2490
AsianTotal                2484
BlackMale                 2494
BlackFemale               2497
BlackTotal                2487
HPIMale                   2608
HPIFemale                 2607
HPITotal                  2561
HispanicMale              2481
HispanicFemale            2480
HispanicTotal             2480
TRMale                    2487
TRFemale                  2485
TRTotal                   2484
WhiteMale                 2481
WhiteFemale               2481
WhiteTotal                2480
Latitude                     0
Longitude                    0
dtype: int64
In [10]:
def missing(DataFrame):
    print('Percentage of missing values in the dataset:\n',
          round((DataFrame.isnull().sum() *100/len(DataFrame)), 2).sort_values(ascending=False))

missing(psChar_23)
Percentage of missing values in the dataset:
 Grade13                 99.87
AdultEd                 99.82
Street2                 99.44
Ungraded                92.22
Grade12                 72.57
Grade10                 72.49
Grade11                 72.49
Grade9                  72.28
PreK                    68.05
Grade7                  67.23
Grade8                  66.97
Grade6                  62.50
Grade5                  47.71
Kindergarten            46.68
Grade4                  46.49
Grade1                  46.33
Grade3                  46.29
Grade2                  46.28
StaffFTE                 3.80
HPIFemale                2.57
HPIMale                  2.57
AIANMale                 2.55
AIANFem                  2.54
HPITotal                 2.53
AIANTotal                2.50
AsianMale                2.46
BlackMale                2.46
AsianFemale              2.46
BlackFemale              2.46
WhiteFemale              2.45
WhiteTotal               2.45
TRTotal                  2.45
AsianTotal               2.45
BlackTotal               2.45
HispanicMale             2.45
HispanicFemale           2.45
HispanicTotal            2.45
WhiteMale                2.45
TotFemaleEnrollment      2.45
TRFemale                 2.45
TRMale                   2.45
TotMaleEnrollment        2.45
StudentTeacherRatio      1.79
TotalEnrollment          1.65
Member                   1.65
City                     0.00
Street1                  0.00
SchoolName               0.00
LEAname                  0.00
LEAID                    0.00
ST_LEAID                 0.00
StateABR                 0.00
SurveyYear               0.00
X                        0.00
NCESID                   0.00
ObjectID                 0.00
Y                        0.00
ReducedLunch             0.00
MealProgramCertified     0.00
TotalFreeLunch           0.00
FreeLunch                0.00
Zip                      0.00
State                    0.00
Zip4                     0.00
Phone                    0.00
Charter                  0.00
Virtual                  0.00
LowestGrade              0.00
HighestGrade             0.00
SchoolLevel              0.00
Status                   0.00
SchoolType               0.00
Status_Text              0.00
Locale                   0.00
County                   0.00
Latitude                 0.00
Longitude                0.00
dtype: float64

Observations

Eighteen columns have more than forty-five percent of their values missing. For the 'Grade' columns, this is expected: the dataset includes schools at various education levels, so most schools do not offer every grade level. The free/reduced lunch and student-teacher ratio columns also contain missing data that isnull() does not fully capture, because, as indicated in the online description of this dataset, it is encoded with indicator values: -1 indicates that data is missing, -2 or N indicates that data is not applicable, and -9 indicates that data did not meet NCES data quality standards. Given this information, I would drop the AdultEd and Grade13 columns, as this research is focused only on youth sports participation in traditional public schools. I would also drop the columns 'Phone', 'LEAname', 'LEAID', 'ST_LEAID', 'SurveyYear', 'StaffFTE', 'Member', and 'NCESID', as they are not necessary for analysis. I also plan to remove the rows with negative values in the lunch and student-teacher ratio columns.
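Because the indicator codes are stored as ordinary numbers, one way to see the true extent of missing data is to convert them to NaN before counting. The following is only a sketch on a toy frame (the values and the `SENTINELS` name are assumptions for illustration); it is an alternative to the row filter used later in this notebook, not a description of it:

```python
import numpy as np
import pandas as pd

# Toy stand-in using the renamed lunch columns; values are illustrative only
toy = pd.DataFrame({
    "FreeLunch": [654, -1, 367, -9],
    "ReducedLunch": [43, -1, -2, 25],
})

# -1 = missing, -2 = not applicable, -9 = did not meet quality standards
SENTINELS = [-1, -2, -9]

# Replace the indicator codes with NaN so isnull() reflects true missingness
cleaned = toy.replace(SENTINELS, np.nan)
missing_pct = (cleaned.isnull().mean() * 100).round(2)

print(missing_pct)
```

Converting to NaN keeps the affected schools in the frame for analyses that do not need the lunch columns, whereas filtering on negative values removes those rows entirely.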

In [11]:
dropCols = ['AdultEd','Phone','LEAname','LEAID','ST_LEAID','SurveyYear','StaffFTE','Member','NCESID','Grade13']

# Drop the columns that are not needed for the analysis
psChar_23 = psChar_23.drop(columns=dropCols)

# Re-check the missing-value counts after dropping
psChar_23.isnull().sum()
Out[11]:
X                            0
Y                            0
ObjectID                     0
StateABR                     0
SchoolName                   0
Street1                      1
Street2                 100818
City                         0
State                        0
Zip                          0
Zip4                         0
Charter                      0
Virtual                      0
LowestGrade                  0
HighestGrade                 0
SchoolLevel                  0
Status                       0
SchoolType                   0
Status_Text                  0
Locale                       0
County                       0
TotalFreeLunch               0
FreeLunch                    0
ReducedLunch                 0
MealProgramCertified         0
PreK                     68998
Kindergarten             47329
Grade1                   46978
Grade2                   46921
Grade3                   46931
Grade4                   47132
Grade5                   48376
Grade6                   63367
Grade7                   68166
Grade8                   67898
Grade9                   73289
Grade10                  73501
Grade11                  73502
Grade12                  73574
Ungraded                 93501
TotMaleEnrollment         2480
TotFemaleEnrollment       2480
TotalEnrollment           1671
StudentTeacherRatio       1814
AIANMale                  2581
AIANFem                   2579
AIANTotal                 2533
AsianMale                 2492
AsianFemale               2490
AsianTotal                2484
BlackMale                 2494
BlackFemale               2497
BlackTotal                2487
HPIMale                   2608
HPIFemale                 2607
HPITotal                  2561
HispanicMale              2481
HispanicFemale            2480
HispanicTotal             2480
TRMale                    2487
TRFemale                  2485
TRTotal                   2484
WhiteMale                 2481
WhiteFemale               2481
WhiteTotal                2480
Latitude                     0
Longitude                    0
dtype: int64
In [12]:
# Inspect the operational-status labels present in the dataset
psChar_23["Status_Text"].unique()

# Exclude schools that are not yet open or are temporarily closed
excludedStatus = "School to be operational within two years|School temporarily closed"
psChar_23 = psChar_23[~psChar_23["Status_Text"].str.contains(excludedStatus, na=False)]
In [13]:
# Inspect the types of schools listed in the dataset
psChar_23["SchoolType"].unique()

# Keep only traditional (regular) schools
psChar_23 = psChar_23[psChar_23["SchoolType"].str.contains("Regular School", na=False)]
In [14]:
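# Keep rows whose FRPL (free and reduced-price lunch) counts and student-teacher
# ratio are all non-negative; negative values are NCES missing-data codes.
# Rows with NaN in these columns are dropped too, since NaN >= 0 is False.
negativeCols = ['ReducedLunch', 'MealProgramCertified','FreeLunch','StudentTeacherRatio']

psChar_23 = psChar_23[(psChar_23[negativeCols] >= 0).all(axis=1)]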
In [15]:
psChar_23.shape
Out[15]:
(37392, 67)
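The filtered frame keeps 37,392 of the original 101,390 rows, so it may be worth recording how many rows each filter removes. A minimal sketch of that bookkeeping on a toy frame (the values, the "Other" label, and the `steps` dictionary are assumptions for illustration):

```python
import pandas as pd

# Toy stand-in; values and the "Other" label are illustrative only
toy = pd.DataFrame({
    "SchoolType": ["Regular School", "Regular School", "Other", "Regular School"],
    "FreeLunch": [654, -1, 120, 367],
})

steps = {"start": len(toy)}

# Same style of filter as above: keep regular schools only
toy = toy[toy["SchoolType"].str.contains("Regular School", na=False)]
steps["regular schools only"] = len(toy)

# Same style of filter as above: drop rows with negative indicator codes
toy = toy[(toy[["FreeLunch"]] >= 0).all(axis=1)]
steps["non-negative FreeLunch"] = len(toy)

attrition = pd.Series(steps)
print(attrition)
```

A table like this makes it easier to judge whether a single filter accounts for most of the loss and whether the remaining schools are still representative.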
In [16]:
psChar_23['Locale'].unique()
Out[16]:
array(['32-Town: Distant', '42-Rural: Distant', '41-Rural: Fringe',
       '13-City: Small', '21-Suburb: Large', '33-Town: Remote',
       '31-Town: Fringe', '23-Suburb: Small', '12-City: Mid-size',
       '43-Rural: Remote', '22-Suburb: Mid-size', '11-City: Large'],
      dtype=object)
In [17]:
Locale = {'42-Rural: Distant':'Rural',
            '41-Rural: Fringe':'Rural',
            '43-Rural: Remote':'Rural',
            '32-Town: Distant':'Town',
            '33-Town: Remote':'Town',
            '31-Town: Fringe':'Town',
            '13-City: Small':'City',
            '12-City: Mid-size':'City',
            '11-City: Large':'City',
            '21-Suburb: Large':'Suburb',
            '23-Suburb: Small':'Suburb',
            '22-Suburb: Mid-size':'Suburb'}

Locale
Out[17]:
{'42-Rural: Distant': 'Rural',
 '41-Rural: Fringe': 'Rural',
 '43-Rural: Remote': 'Rural',
 '32-Town: Distant': 'Town',
 '33-Town: Remote': 'Town',
 '31-Town: Fringe': 'Town',
 '13-City: Small': 'City',
 '12-City: Mid-size': 'City',
 '11-City: Large': 'City',
 '21-Suburb: Large': 'Suburb',
 '23-Suburb: Small': 'Suburb',
 '22-Suburb: Mid-size': 'Suburb'}
In [18]:
psChar_23['Locale'] = psChar_23['Locale'].map(Locale)

psChar_23['Locale'].unique()
Out[18]:
array(['Town', 'Rural', 'City', 'Suburb'], dtype=object)
In [19]:
psChar_23.describe()
Out[19]:
X Y ObjectID Zip Status TotalFreeLunch FreeLunch ReducedLunch MealProgramCertified PreK Kindergarten Grade1 Grade2 Grade3 Grade4 Grade5 Grade6 Grade7 Grade8 Grade9 Grade10 Grade11 Grade12 Ungraded TotMaleEnrollment TotFemaleEnrollment TotalEnrollment StudentTeacherRatio AIANMale AIANFem AIANTotal AsianMale AsianFemale AsianTotal BlackMale BlackFemale BlackTotal HPIMale HPIFemale HPITotal HispanicMale HispanicFemale HispanicTotal TRMale TRFemale TRTotal WhiteMale WhiteFemale WhiteTotal Latitude Longitude
count 37392.000000 37392.000000 37392.000000 37392.000000 37392.000000 37392.000000 37392.000000 37392.000000 37392.000000 11978.000000 22550.000000 22629.000000 22642.000000 22602.000000 22538.000000 22257.000000 14797.000000 11647.000000 11603.000000 8163.000000 8083.000000 8065.000000 8051.000000 1952.000000 36722.000000 36722.000000 37392.000000 37392.000000 36672.000000 36670.000000 36700.000000 36717.000000 36720.000000 36722.000000 36713.000000 36711.000000 36717.000000 36661.000000 36663.000000 36689.000000 36722.000000 36722.000000 36722.000000 36719.000000 36720.000000 36720.000000 36722.000000 36722.000000 36722.000000 37392.000000 37392.000000
mean -100.251468 37.290953 36287.616683 63446.899497 1.014549 329.526610 294.008478 35.518132 211.785756 32.782017 72.719335 71.254938 69.543636 71.746350 71.135815 72.845082 110.678448 141.237400 144.294062 223.755115 217.416924 200.690763 191.246429 5.592725 298.766843 283.438457 582.363982 17.143016 3.452471 3.328279 6.775395 18.924422 17.763154 36.684031 45.420178 43.891340 89.299398 1.804724 1.707362 3.509499 90.627880 86.666930 177.294810 16.256734 15.626416 31.882707 122.303170 114.477398 236.780568 37.290953 -100.251468
std 19.640040 6.016159 29092.351157 28541.959418 0.193091 302.519034 276.192414 57.319530 205.370496 38.554464 43.464381 41.309698 40.438018 41.754483 42.033706 46.943546 107.750815 131.743413 135.120098 228.483527 214.329761 199.781788 191.818897 9.279454 244.643853 236.573281 478.938331 13.327592 16.876568 16.257685 33.002197 51.673689 48.952495 100.345711 81.985733 80.888984 162.123805 10.492019 9.835723 20.222761 136.744539 131.278507 267.380097 19.221038 18.730197 37.461901 131.928343 126.522191 257.671934 6.016159 19.640040
min -171.715402 14.140873 1.000000 3901.000000 1.000000 3.000000 0.000000 0.000000 3.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 9.000000 0.610000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 14.140873 -171.715402
25% -118.201758 33.753817 11736.500000 34249.000000 1.000000 136.000000 115.000000 5.000000 73.000000 10.000000 45.000000 45.000000 44.000000 45.000000 44.000000 44.000000 36.000000 34.000000 35.000000 44.000000 44.000000 42.000000 40.000000 0.000000 157.000000 148.000000 308.000000 13.490000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 2.000000 1.000000 3.000000 0.000000 0.000000 0.000000 11.000000 11.000000 22.000000 4.000000 3.000000 7.000000 26.000000 24.000000 50.000000 33.753816 -118.201758
50% -93.889011 36.964123 26215.500000 64014.000000 1.000000 261.000000 230.000000 20.000000 158.000000 24.000000 69.000000 68.000000 66.000000 68.000000 67.000000 68.000000 73.000000 95.000000 97.000000 135.000000 132.000000 120.000000 113.000000 2.000000 244.000000 230.000000 474.000000 16.140000 1.000000 0.000000 1.000000 3.000000 3.000000 6.000000 11.000000 10.000000 21.000000 0.000000 0.000000 0.000000 40.000000 38.000000 78.000000 11.000000 10.000000 22.000000 89.000000 82.000000 171.000000 36.964123 -93.889011
75% -84.473956 39.752610 51456.250000 92405.000000 1.000000 427.000000 387.000000 45.000000 285.000000 43.000000 95.000000 92.000000 90.000000 93.000000 92.000000 94.000000 150.000000 228.000000 232.000000 373.000000 361.000000 330.000000 310.000000 8.000000 358.000000 338.000000 694.000000 20.000000 2.000000 2.000000 3.000000 14.000000 13.000000 27.000000 54.000000 52.000000 106.000000 1.000000 1.000000 2.000000 118.000000 114.000000 233.000000 22.000000 21.000000 44.000000 172.000000 160.000000 332.000000 39.752610 -84.473956
max 145.784430 71.298478 100508.000000 99950.000000 8.000000 5770.000000 5563.000000 1400.000000 2921.000000 903.000000 873.000000 646.000000 665.000000 691.000000 669.000000 727.000000 923.000000 844.000000 930.000000 6251.000000 2855.000000 1293.000000 1339.000000 223.000000 4352.000000 4524.000000 8876.000000 1860.000000 585.000000 513.000000 1098.000000 1335.000000 1224.000000 2559.000000 2195.000000 2207.000000 4402.000000 556.000000 440.000000 996.000000 1947.000000 2118.000000 4065.000000 436.000000 422.000000 828.000000 1989.000000 2305.000000 4294.000000 71.298478 145.784430
In [20]:
psChar_23og = psChar_23.copy()  # independent snapshot; plain assignment would only alias the same DataFrame

Observations of Descriptive Statistics¶

(Min, Max):

  • TotalFreeLunch (3, 5770); FreeLunch (0, 5563); ReducedLunch (0, 1400); MealProgramCertified (3, 2921)
  • PreK (0, 903)
  • Kindergarten (0, 873)
  • Grade1 (0, 646)
  • Grade2 (0, 665)
  • Grade3 (0, 691)
  • Grade4 (0, 669)
  • Grade5 (0, 727)
  • Grade6 (0, 923)
  • Grade7 (0, 844)
  • Grade8 (0, 930)
  • Grade9 (0, 6251)
  • Grade10 (0, 2855)
  • Grade11 (0, 1293)
  • Grade12 (0, 1339)
  • Ungraded (0, 223)
  • Total Male Enrollment (0, 4352); Total Female Enrollment (0, 4524); Total Enrollment (9, 8876)
  • Student Teacher Ratio (0.61, 1860)
  • American Indian/Alaskan Native Male (0, 585); American Indian/Alaskan Native Female (0, 513); American Indian/Alaskan Native Total (0, 1098)
  • Asian Male (0, 1335); Asian Female (0, 1224); Asian Total (0, 2559)
  • Black Male (0, 2195); Black Female (0, 2207); Black Total (0, 4402)
  • Native Hawaiian/Pacific Islander(HPI) Male (0, 556); Native Hawaiian/Pacific Islander(HPI) Female (0, 440); Native Hawaiian/Pacific Islander(HPI) Total (0, 996)
  • Hispanic Male (0, 1947); Hispanic Female (0, 2118); Hispanic Total (0, 4065)
  • Two or More Races Male (0, 436); Two or More Races Female (0, 422); Two or More Races Total (0, 828)
  • White Male (0, 1989); White Female (0, 2305); White Total (0, 4294)

Mean:

  • TotalFreeLunch: 329.53; FreeLunch: 294.01; ReducedLunch: 35.52; MealProgramCertified: 211.79
  • PreK: 32.78 students
  • Kindergarten: 72.72 students
  • Grade1: 71.25 students
  • Grade2: 69.54 students
  • Grade3: 71.75 students
  • Grade4: 71.14 students
  • Grade5: 72.85 students
  • Grade6: 110.68 students
  • Grade7: 141.24 students
  • Grade8: 144.29 students
  • Grade9: 223.76 students
  • Grade10: 217.42 students
  • Grade11: 200.69 students
  • Grade12: 191.25 students
  • Ungraded: 5.59 students
  • Total Male Enrollment: 298.77 students; Total Female Enrollment: 283.44 students; Total Enrollment: 582.36 students
  • Student Teacher Ratio: 17.14 students/teacher
  • American Indian/Alaskan Native Male: 3.45 students; American Indian/Alaskan Native Female: 3.33 students; American Indian/Alaskan Native Total: 6.78 students
  • Asian Male: 18.92 students; Asian Female: 17.76 students; Asian Total: 36.68 students
  • Black Male: 45.42 students; Black Female: 43.89 students; Black Total: 89.30 students
  • Native Hawaiian/Pacific Islander(HPI) Male: 1.80 students; Native Hawaiian/Pacific Islander(HPI) Female: 1.71 students; Native Hawaiian/Pacific Islander(HPI) Total: 3.51 students
  • Hispanic Male: 90.63 students; Hispanic Female: 86.67 students; Hispanic Total: 177.29 students
  • Two or More Races Male: 16.26 students; Two or More Races Female: 15.63 students; Two or More Races Total: 31.88 students
  • White Male: 122.30 students; White Female: 114.48 students; White Total: 236.78 students

Quartile Ranges (25%, 75%):

  • TotalFreeLunch: (136, 427); FreeLunch: (115, 387); ReducedLunch: (5, 45); MealProgramCertified: (73, 285)
  • PreK: (10, 43)
  • Kindergarten: (45, 95)
  • Grade1: (45, 92)
  • Grade2: (44, 90)
  • Grade3: (45, 93)
  • Grade4: (44, 92)
  • Grade5: (44, 94)
  • Grade6: (36, 150)
  • Grade7: (34, 228)
  • Grade8: (35, 232)
  • Grade9: (44, 373)
  • Grade10: (44, 361)
  • Grade11: (42, 330)
  • Grade12: (40, 310)
  • Ungraded: (0, 8)
  • Total Male Enrollment: (157, 358); Total Female Enrollment: (148, 338); Total Enrollment: (308, 694)
  • Student Teacher Ratio: (13.49, 20)
  • American Indian/Alaskan Native Male: (0, 2); American Indian/Alaskan Native Female: (0, 2); American Indian/Alaskan Native Total: (0, 3)
  • Asian Male: (0, 14); Asian Female: (0, 13); Asian Total: (1, 27)
  • Black Male: (2, 54); Black Female: (1, 52); Black Total: (3, 106)
  • Native Hawaiian/Pacific Islander(HPI) Male: (0, 1); Native Hawaiian/Pacific Islander(HPI) Female: (0, 1); Native Hawaiian/Pacific Islander(HPI) Total: (0, 2)
  • Hispanic Male: (11, 118); Hispanic Female: (11, 114); Hispanic Total: (22, 233)
  • Two or More Races Male: (4, 22); Two or More Races Female: (3, 21); Two or More Races Total: (7, 44)
  • White Male: (26, 172); White Female: (24, 160); White Total: (50, 332)

Standard Deviation:

  • Std higher than mean: Reduced Lunch, PreK, Grade9, Grade12, Ungraded; all student races
  • Std lower than mean: Total Free Lunch, Free Lunch, Meal Program Certified, all grades except PreK, 9, and 12, Total Male Enrollment, Total Female Enrollment, Total Enrollment, Student to Teacher Ratio

FRPL rates- the std's are moderately lower than the means, excluding the std for ReducedLunch which is higher than the mean.

The standard deviations for PreK, and Grades 9 and 12, are higher than the means, while all other grades are lower.

The standard deviations for enrollment rates are lower than the means.

The standard deviation for the student to teacher ratio is lower than the mean.

The standard deviations for all student demographics are higher than the means, though the gap for White students is much smaller than for other races/ethnicities.

Mean/Median Closeness:

The medians for the free/reduced lunch status of the schools are lower than the means.

For the columns covering the elementary school grades, the medians are close but lower than the mean values. For the other grades, the medians are not as close, but are still lower than the means.

The median for the student-teacher ratio is close to the mean.

The median values for the Black and Hispanic student totals are far lower than the means (21 vs. 89.30 for Black students; 78 vs. 177.29 for Hispanic students).
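
The standard deviation and mean/median observations above can be expressed as two simple ratios. The following is a minimal sketch that hard-codes a few rows copied from the describe() output instead of reading the dataset; the names `summary`, `std_to_mean`, and `mean_to_median` are illustrative assumptions, not part of the original notebook.

```python
# (mean, std, median) copied from the describe() output for a few columns
summary = {
    "FreeLunch": (294.008478, 276.192414, 230.0),
    "ReducedLunch": (35.518132, 57.319530, 20.0),
    "StudentTeacherRatio": (17.143016, 13.327592, 16.14),
    "BlackTotal": (89.299398, 162.123805, 21.0),
    "HispanicTotal": (177.294810, 267.380097, 78.0),
    "WhiteTotal": (236.780568, 257.671934, 171.0),
}


def std_to_mean(mean, std):
    """Ratio of std to mean; values above 1 mean the std exceeds the mean."""
    return std / mean


def mean_to_median(mean, median):
    """Ratio of mean to median; values well above 1 suggest right skew."""
    return mean / median


cv = {col: std_to_mean(m, s) for col, (m, s, _) in summary.items()}
skew_ratio = {col: mean_to_median(m, med) for col, (m, _, med) in summary.items()}

for col in summary:
    print(f"{col:20s} std/mean={cv[col]:.2f}  mean/median={skew_ratio[col]:.2f}")
```

Reading the two ratios side by side makes it easier to see which columns are driven by a small number of very large schools.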

In [21]:
print(psChar_23["StudentTeacherRatio"].describe())
count    37392.000000
mean        17.143016
std         13.327592
min          0.610000
25%         13.490000
50%         16.140000
75%         20.000000
max       1860.000000
Name: StudentTeacherRatio, dtype: float64
In [22]:
psChar_23.columns
Out[22]:
Index(['X', 'Y', 'ObjectID', 'StateABR', 'SchoolName', 'Street1', 'Street2',
       'City', 'State', 'Zip', 'Zip4', 'Charter', 'Virtual', 'LowestGrade',
       'HighestGrade', 'SchoolLevel', 'Status', 'SchoolType', 'Status_Text',
       'Locale', 'County', 'TotalFreeLunch', 'FreeLunch', 'ReducedLunch',
       'MealProgramCertified', 'PreK', 'Kindergarten', 'Grade1', 'Grade2',
       'Grade3', 'Grade4', 'Grade5', 'Grade6', 'Grade7', 'Grade8', 'Grade9',
       'Grade10', 'Grade11', 'Grade12', 'Ungraded', 'TotMaleEnrollment',
       'TotFemaleEnrollment', 'TotalEnrollment', 'StudentTeacherRatio',
       'AIANMale', 'AIANFem', 'AIANTotal', 'AsianMale', 'AsianFemale',
       'AsianTotal', 'BlackMale', 'BlackFemale', 'BlackTotal', 'HPIMale',
       'HPIFemale', 'HPITotal', 'HispanicMale', 'HispanicFemale',
       'HispanicTotal', 'TRMale', 'TRFemale', 'TRTotal', 'WhiteMale',
       'WhiteFemale', 'WhiteTotal', 'Latitude', 'Longitude'],
      dtype='object')
In [23]:
print(type(psChar_23))
<class 'pandas.core.frame.DataFrame'>

Scatter Plots¶

In [24]:
import matplotlib.pyplot as plt
import numpy as np

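# keep rows where the FRPL count does not exceed enrollment; copy so new columns go on an independent frame
psChar_23 = psChar_23[psChar_23["TotalFreeLunch"] <= psChar_23["TotalEnrollment"]].copy()
# percentage of enrolled students eligible for free or reduced-price lunch
psChar_23["LunchRate"] = (psChar_23["TotalFreeLunch"] / psChar_23["TotalEnrollment"]) * 100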

race_colors = {"BlackTotal": "tab:blue", "HispanicTotal": "tab:orange", "WhiteTotal": "tab:green"}
size_values = {"BlackTotal": 45, "HispanicTotal": 45, "WhiteTotal": 45} 
psChar_23["PredominantRace"] = psChar_23[["BlackTotal", "HispanicTotal", "WhiteTotal"]].idxmax(axis=1)

fig, ax = plt.subplots()
for race, color in race_colors.items():
    subset = psChar_23[psChar_23["PredominantRace"] == race]  
    x = subset["LunchRate"]
    y = subset["StudentTeacherRatio"]
    ax.scatter(x, y, c=color, s=size_values[race], label=race, alpha=0.3, edgecolors='none')

ax.legend(('Predominately Black School', 'Predominately Latino/Hispanic School', 'Predominately White School'), loc='upper right', shadow=True)
ax.grid(True)
ax.set_xlabel("% of Students w/ FRPL Eligibility")
ax.set_ylabel("Students per Teacher")
ax.set_title("FRPL Eligibility & Student-Teacher Ratio (by Race)")

ax.set_xlim(0, 100)  
ax.set_ylim(0, 50)  

ax.set_ymargin(0.1)   
ax.set_xmargin(0.1)

plt.show()
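[Figure: scatter plot, "FRPL Eligibility & Student-Teacher Ratio (by Race)"; x-axis: % of Students w/ FRPL Eligibility; y-axis: Students per Teacher]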

Hispanic/Latino Students¶

In [25]:
race_colors = {"BlackTotal": "tab:blue", "HispanicTotal": "tab:orange", "WhiteTotal": "tab:green"}
size_values = {"BlackTotal": 0, "HispanicTotal": 45, "WhiteTotal": 0} 
psChar_23["PredominantRace"] = psChar_23[["BlackTotal", "HispanicTotal", "WhiteTotal"]].idxmax(axis=1)

fig, ax = plt.subplots()
for race, color in race_colors.items():
    subset = psChar_23[psChar_23["PredominantRace"] == race]  
    x = subset["LunchRate"]
    y = subset["StudentTeacherRatio"]
    ax.scatter(x, y, c=color, s=size_values[race], label=race, alpha=0.3, edgecolors='none')


ax.grid(True)
ax.set_xlabel("% of Students w/ FRPL Eligibility")
ax.set_ylabel("Students per Teacher")
ax.set_title("FRPL Eligibility & Student-Teacher Ratio (Hispanic Students)")

ax.set_xlim(0, 100)  
ax.set_ylim(0, 50)  

ax.set_ymargin(0.1)   
ax.set_xmargin(0.1)

plt.show()
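[Figure: scatter plot, "FRPL Eligibility & Student-Teacher Ratio (Hispanic Students)"; x-axis: % of Students w/ FRPL Eligibility; y-axis: Students per Teacher]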

Black Students¶

In [26]:
race_colors = {"BlackTotal": "tab:blue", "HispanicTotal": "tab:orange", "WhiteTotal": "tab:green"}
size_values = {"BlackTotal": 45, "HispanicTotal": 0, "WhiteTotal": 0} 
psChar_23["PredominantRace"] = psChar_23[["BlackTotal", "HispanicTotal", "WhiteTotal"]].idxmax(axis=1)

fig, ax = plt.subplots()
for race, color in race_colors.items():
    subset = psChar_23[psChar_23["PredominantRace"] == race]  
    x = subset["LunchRate"]
    y = subset["StudentTeacherRatio"]
    ax.scatter(x, y, c=color, s=size_values[race], label=race, alpha=0.3, edgecolors='none')


ax.grid(True)
ax.set_xlabel("% of Students w/ FRPL Eligibility")
ax.set_ylabel("Students per Teacher")
ax.set_title("FRPL Eligibility & Student-Teacher Ratio (Black Students)")

ax.set_xlim(0, 100)  
ax.set_ylim(0, 50)   

ax.set_ymargin(0.1)   
ax.set_xmargin(0.1)

plt.show()
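[Figure: scatter plot, "FRPL Eligibility & Student-Teacher Ratio (Black Students)"; x-axis: % of Students w/ FRPL Eligibility; y-axis: Students per Teacher]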

White Students¶

In [27]:
race_colors = {"BlackTotal": "tab:blue", "HispanicTotal": "tab:orange", "WhiteTotal": "tab:green"}
size_values = {"BlackTotal": 0, "HispanicTotal": 0, "WhiteTotal": 45} 
psChar_23["PredominantRace"] = psChar_23[["BlackTotal", "HispanicTotal", "WhiteTotal"]].idxmax(axis=1)

fig, ax = plt.subplots()
for race, color in race_colors.items():
    subset = psChar_23[psChar_23["PredominantRace"] == race]  
    x = subset["LunchRate"]
    y = subset["StudentTeacherRatio"]
    ax.scatter(x, y, c=color, s=size_values[race], label=race, alpha=0.3, edgecolors='none')


ax.grid(True)
ax.set_xlabel("% of Students w/ FRPL Eligibility")
ax.set_ylabel("Students per Teacher")
ax.set_title("FRPL Eligibility & Student-Teacher Ratio (White Students)")

ax.set_xlim(0, 100)  
ax.set_ylim(0, 50)  

ax.set_ymargin(0.1)   
ax.set_xmargin(0.1)

plt.show()
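[Figure: scatter plot, "FRPL Eligibility & Student-Teacher Ratio (White Students)"; x-axis: % of Students w/ FRPL Eligibility; y-axis: Students per Teacher]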

Bubble Plot¶

In [29]:
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
import math

sample_size = 1000
sample = psChar_23.sample(n=sample_size, random_state=1)

hover_text = []
bubble_size = []

for index, row in sample.iterrows():
    hover_text.append(('School: {SchoolName}<br>'+
                      'Lunch Rate: {LunchRate:.2f}<br>'+
                      'Students per Teacher: {StudentTeacherRatio}<br>'+
                      'Total Enrollment: {TotalEnrollment}<br>').format(SchoolName=row['SchoolName'],
                                            LunchRate=row['LunchRate'],
                                            StudentTeacherRatio=row['StudentTeacherRatio'],
                                            TotalEnrollment=row['TotalEnrollment']))
    bubble_size.append(math.sqrt(row['TotalEnrollment']))

sample['text'] = hover_text
sample['size'] = bubble_size
sizeref = 2.*max(sample['size'])/(25**2)

race_categories = ['BlackTotal', 'HispanicTotal', 'WhiteTotal']
race_data = {race: sample[sample["PredominantRace"] == race] for race in race_categories}

fig = go.Figure()

for race, subset in race_data.items():
    fig.add_trace(go.Scatter(
        x=subset["LunchRate"],
        y=subset["StudentTeacherRatio"],
        name=race,
        text=subset["text"],
        marker_size=subset['size'],
        ))

fig.update_traces(mode='markers', marker=dict(sizemode='area',
                                              sizeref=sizeref, line_width=2))

fig.update_layout(
    title="FRPL Eligibility & Student-Teacher Ratio",
    xaxis=dict(title="% of Students w/ FRPL Eligibility", gridcolor='white', gridwidth=2),
    yaxis=dict(title="Students per Teacher", gridcolor='white', gridwidth=2, range=[0, 50],  
        dtick=20),
    paper_bgcolor='rgb(243, 243, 243)',
    plot_bgcolor='rgb(243, 243, 243)',
)

fig.show()

Executive Summary¶

How Funding Disparities in U.S Public Schools Suggest Black/Hispanic Underrepresentation in High-Cost Sports¶

Is there a connection between income and sports participation opportunities?¶

Introduction¶

Each year, the Aspen Institute releases the State of Play report, an analysis capturing sports participation data on youth in the United States. In the most recent report released in 2024, key findings suggest a connection between income and sports participation.

According to the National Survey of Children's Health (NSCH), Vermont, Iowa, North Dakota, Wyoming, Maine, South Dakota, and New Hampshire reported the highest percentage of youth sports participation (over 63%). Alternatively, states such as New Mexico, Nevada, Mississippi, and Louisiana reported below average youth sports participation (under 54%). Excluding Nevada, the remaining states are among the poorest states in the country, and have larger minority populations -- while many of the states with the highest percentages have low minority populations.

For instance, data from the Sports & Fitness Industry Association (SFIA) and its Sports Marketing Surveys (SMS) show that the sports participation rate among Black youth aged 6-17 declined from 45% to 35% over a ten-year period (2013-2023).

It is important to note that sports participation rates in general have been steadily returning to pre-COVID levels; however, post-COVID trends in youth sports participation differ across demographics. For instance, youth aged 6-12 from lower-income families (under $25,000) were the only income group whose sports participation rate declined from 2022 to 2023, while the rate for every other income bracket increased.

Purpose¶

Although more youth are returning to sports post-COVID, opportunities for sports participation seem to be heavily dependent upon factors such as socioeconomic status, accessibility, and resource allocation. In public schools, disparities in funding can limit access to appropriate facilities, personnel, or physical education, which could hinder opportunities to participate in sports or physical activity. Outside of school, the high costs of specialized or club teams and a lack of access to recreational facilities can also be barriers, especially for lower-income families.

Facility Quality¶

In a 2021 report written by the 21st Century School Fund, a non-profit dedicated to improving public school facilities, numerous inequities were found in public school funding due to racial, socioeconomic, and geographic factors. For example, rural school districts with lower-income public schools received about 2.3 million dollars per school to make capital improvements on facilities and buildings. The national average is 4.3 million dollars per school, meaning that those rural low-income schools received only about half as much.

Physical Education¶

A study done by researchers for the Bridging the Gap program found that only 43% of students at predominately Black public elementary schools received the recommended 20+ minutes of recess time, compared with 55% of students at predominately Hispanic/Latino schools and 77% of students at predominately White schools. These percentages suggest a correlation between race and disparities in the physical activity offered to students through recess time. In another study examining physical activity among middle school students, researchers found a significant relationship between schools having a higher number of students using the free/reduced lunch program and having less environmental access for physical activity.

Specialized Sports¶

Many parents sign their children up for specialized sports -- year-round training and competition in the form of AAU or club teams -- in order to increase scouting opportunities and assist with skill/performance development. Sports specialization typically requires costly investments towards participation, travel, and equipment fees, which can create financial barriers, especially in high-cost sports such as tennis, gymnastics, and ice hockey.

Methodology¶

Public School Characteristics 2022-23

Last Updated: October 21, 2024

https://catalog.data.gov/dataset/public-school-characteristics-2022-23-451db

The National Center for Education Statistics (NCES) gathers demographic and geographic data about U.S public schools and factors such as enrollment and Title I status. The variables that will be analyzed are those regarding free/reduced-price lunch (FRPL) rates, class size, and student demographics.

The uncleaned dataset had 101,390 rows and 77 columns. The first step was to address missing values, specifically in the FRPL columns. As indicated in the online description of this dataset, missing values are represented by several indicators: -1 indicates that data is missing, -2 or N indicates that data is not applicable, and -9 indicates that data did not meet NCES data quality standards. Because the numeric indicators are all negative, rows with negative values in the FRPL and student-teacher ratio columns were dropped. The next step was to make sure all of the schools being analyzed were operational, so schools reported as "School temporarily closed" or "School to be operational within two years" in the 'Status_Text' column were removed. Similarly, this analysis is focused on traditional schools, so only schools reported as "Regular School" in the 'SchoolType' column were kept; alternative, special education, and career and technical schools were removed. Additionally, columns unnecessary to the analysis, such as administrative information (StaffFTE, Phone, LEA Name, LEA ID, etc.), were removed. Lastly, since this analysis is focused on K-12 students, the columns 'Adult Ed' and 'Grade 13' were removed.
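
The row filters described above can be sketched on a toy table. This is a hedged illustration, not the notebook's actual cleaning cell: the rows are made up, the status and school type labels other than those quoted above (for example "Currently operational" and "Special Education School") are assumptions, and exact matching with `isin` is swapped in for the substring matching used in the notebook.

```python
import pandas as pd

# Hypothetical rows: column names follow the notebook, values are made up.
raw = pd.DataFrame({
    "SchoolName": ["A", "B", "C", "D", "E"],
    "Status_Text": [
        "Currently operational",
        "School temporarily closed",
        "Currently operational",
        "Currently operational",
        "School to be operational within two years",
    ],
    "SchoolType": [
        "Regular School",
        "Regular School",
        "Special Education School",
        "Regular School",
        "Regular School",
    ],
    "FreeLunch": [120, 80, 40, -1, 10],
    "ReducedLunch": [30, 10, 5, 20, -9],
    "StudentTeacherRatio": [15.2, 14.0, 8.5, 17.1, 16.0],
})

not_operational = [
    "School temporarily closed",
    "School to be operational within two years",
]
numeric_cols = ["FreeLunch", "ReducedLunch", "StudentTeacherRatio"]

is_operational = ~raw["Status_Text"].isin(not_operational)
is_regular = raw["SchoolType"].eq("Regular School")
# negative codes (-1, -2, -9) mark missing or unusable values
has_valid_values = (raw[numeric_cols] >= 0).all(axis=1)

# .copy() gives an independent frame that is safe to add columns to later
cleaned = raw[is_operational & is_regular & has_valid_values].copy()
print(cleaned["SchoolName"].tolist())
```

Building each condition as a named boolean mask keeps the three filters easy to audit separately before they are combined.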

Descriptive statistics were then used to find key information about the variables being analyzed such as minimum and maximum values, mean and median, quartile ranges, and standard deviation.

For visualizations, scatter plots and a bubble plot were chosen because the two main variables for analysis are both numerical, and these plot types can reveal any clear correlation between them. The visualizations were also color coded based on the predominant race (Black, Hispanic/Latino, or White) of each school, defined as whichever of the three groups has the largest enrollment at that school. Four scatter plots were generated: one with all three groups, and one for each group individually. One bubble plot was generated from a sample of 1000 schools, with all analyzed groups in one plot.
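
One caveat about the predominant race label is worth illustrating. The notebook assigns it with `idxmax` across the three total columns, which picks the largest of the three groups even when that group is a small share of enrollment. The sketch below uses made-up counts; the `PredominantShare` and `IsMajority` columns are hypothetical additions for illustration and are not part of the original analysis.

```python
import pandas as pd

# Hypothetical schools: column names follow the dataset, counts are made up.
schools = pd.DataFrame({
    "SchoolName": ["A", "B", "C"],
    "BlackTotal": [30, 300, 10],
    "HispanicTotal": [25, 50, 120],
    "WhiteTotal": [20, 40, 60],
    "TotalEnrollment": [200, 400, 200],
})
groups = ["BlackTotal", "HispanicTotal", "WhiteTotal"]

# Same labeling rule as the notebook: the largest of the three columns wins.
schools["PredominantRace"] = schools[groups].idxmax(axis=1)

# Hypothetical extra columns: how large is the labeled group, and is it a majority?
schools["PredominantShare"] = schools[groups].max(axis=1) / schools["TotalEnrollment"]
schools["IsMajority"] = schools["PredominantShare"] > 0.5

print(schools[["SchoolName", "PredominantRace", "PredominantShare", "IsMajority"]])
```

In this toy table, school A is labeled "BlackTotal" although Black students make up only 15% of its enrollment, so a share threshold may be worth considering if the label is meant to indicate a majority.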

Analysis¶

According to the National Center for Education Statistics (NCES), the following percentages are used to determine low-poverty and high-poverty schools:

  • Low-poverty: 25% or less of students are FRPL eligible
  • Mid-low poverty: 25.1% to 50% of students are FRPL eligible
  • Mid-high poverty: 50.1% to 75% of students are FRPL eligible
  • High-poverty: more than 75% of students are FRPL eligible
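
The categories above can be sketched as a small lookup function. This is a minimal illustration, not code from the notebook: the function name `poverty_category` is hypothetical, and the sketch assumes a school at exactly 75% falls in the mid-high band so that the bands do not overlap.

```python
def poverty_category(frpl_pct):
    """Map a school's FRPL eligibility percentage (0-100) to a poverty band.

    Assumption: exactly 75% is treated as mid-high poverty, so only values
    above 75% count as high-poverty.
    """
    if not 0 <= frpl_pct <= 100:
        raise ValueError("FRPL percentage must be between 0 and 100")
    if frpl_pct <= 25:
        return "Low-poverty"
    if frpl_pct <= 50:
        return "Mid-low poverty"
    if frpl_pct <= 75:
        return "Mid-high poverty"
    return "High-poverty"


for pct in (12.0, 25.1, 75.0, 92.5):
    print(pct, poverty_category(pct))
```

Applied to the notebook's `LunchRate` column, a function like this would allow schools to be counted per band rather than read off the scatter plots by eye.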

For the analysis, FRPL values were used as an indicator of income, while student-to-teacher ratio was used as an indicator of resource allocation. Further research might find that school locale (city, suburb, town, rural) also correlates with these variables, but locale was not considered in this analysis.

FRPL¶

The first observations are the mean numbers of students per school receiving free lunch and reduced-price lunch. The mean for free lunch is around 294.01 students, while the mean for reduced-price lunch is around 35.52 students. Because far more FRPL-qualifying students receive the full free lunch than the reduced price on average, this could be a possible indicator of lower-income schools. The standard deviation (std) for the free lunch column is lower than the mean, while the std for the reduced-price lunch column is higher, suggesting more relative variability in the number of students receiving reduced-price lunch. The quartiles support this: the middle half of schools have only 5-45 students receiving reduced-price lunch, yet the maximum is 1400 students, far above the mean. From these statistics, it can be inferred that a small number of schools have a disproportionate number of students receiving FRPL, and they might be overlooked if only the mean is considered.

Enrollment Trends¶

The mean values for enrollment show a clear drop from grade 9 to grade 12, from a mean of around 223.76 students per school (grade 9) to around 191.25 students per school (grade 12). Grade 9 has a std (around 228.48) larger than its mean, with a max value of 6251 students. Grade 12 also has a std larger than its mean, but only by about 0.6 students.
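
The size of that drop can be worked out directly from the two means in the describe() output. The sketch below is simple arithmetic with hard-coded values; note as a caveat that the two means are computed over slightly different sets of schools (8163 schools report Grade 9 and 8051 report Grade 12), so the result is only a rough indication.

```python
# Mean students per school, copied from the describe() output
grade9_mean = 223.755115
grade12_mean = 191.246429

drop = grade9_mean - grade12_mean
drop_pct = 100 * drop / grade9_mean

print(f"{drop:.2f} fewer students on average ({drop_pct:.1f}% lower)")
```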

Class Size¶

Looking at the descriptive statistics for the student-to-teacher ratio, the values seem consistent: the std of around 13.33 students/teacher is lower than the mean of around 17.14 students/teacher, and the middle half of schools falls in a narrow range (13.49-20.00). However, the max value is 1860 students/teacher, indicating either a data entry error or a highly atypical school.
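
One common way to flag values like this is Tukey's 1.5 × IQR rule. It was not part of the original analysis; the sketch below applies it to the quartiles reported above as an illustration, and the helper name `is_outlier` is hypothetical.

```python
# Quartiles of StudentTeacherRatio, copied from the describe() output
q1, q3 = 13.49, 20.00

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr


def is_outlier(ratio):
    """True when a ratio falls outside the 1.5 x IQR fences."""
    return ratio < lower_fence or ratio > upper_fence


print(f"fences: {lower_fence:.3f} to {upper_fence:.3f}")
for ratio in (0.61, 17.14, 1860.0):
    print(ratio, is_outlier(ratio))
```

Under this rule both the reported minimum (0.61) and maximum (1860) fall outside the fences, while the mean does not. Whether such rows should be dropped or investigated is a separate judgment call.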

Race/Ethnicity of Students¶

The mean for total Black students is around 89.30 students per school, and the quartiles range from 3 to 106 students; however, the max value goes up to 4402. For Hispanic/Latino students, the mean is around 177.29 students, with quartiles ranging from 22 to 233 and a max value of 4065. The mean for White students is around 236.78 students, with quartiles ranging from 50 to 332 students and a max value of 4294. These statistics suggest that White students are more evenly dispersed across schools, while Black and Hispanic enrollment is more concentrated -- most schools have few or no Black or Hispanic students, while a small number have high concentrations of them.

Visualization Findings¶

Scatter Plots¶

In the scatter plot displaying all groups in this analysis (Black, Hispanic, White), the first thing to note is that the number of points representing predominately White schools is significantly larger than the number representing predominately Black or Hispanic schools.

For predominately White schools, there is a larger concentration of points to the left (lower FRPL percentage). As the FRPL percentage increases, the number of students per teacher slightly decreases, but the plot itself does not show a strong trend.

For predominately Black schools, there are significantly fewer points than for predominately Hispanic or White schools. On this plot, there is a much larger concentration of schools with a higher FRPL percentage: most fall between 60% and 100%, and only a very small number have 0-40% of students eligible for FRPL.

For predominately Hispanic schools, there is also a larger concentration of schools with a higher FRPL percentage, though the points are more dispersed across the plot than those in the predominately Black schools plot.
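
The visual impressions above could be backed by per-group summary numbers. The sketch below shows one possible approach on a made-up table; the rows are not taken from the dataset, and only the column names follow the notebook. It reports the school count, the median FRPL percentage, and the Pearson correlation between FRPL percentage and student-teacher ratio for each group.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the dataset.
toy = pd.DataFrame({
    "PredominantRace": ["WhiteTotal"] * 3 + ["BlackTotal"] * 3,
    "LunchRate": [10.0, 20.0, 30.0, 70.0, 80.0, 90.0],
    "StudentTeacherRatio": [18.0, 17.0, 16.0, 15.0, 15.5, 16.0],
})

school_counts = toy["PredominantRace"].value_counts()
median_frpl = toy.groupby("PredominantRace")["LunchRate"].median()

# Pearson correlation within each group, computed one group at a time
pearson_r = {
    race: grp["LunchRate"].corr(grp["StudentTeacherRatio"])
    for race, grp in toy.groupby("PredominantRace")
}

print(school_counts)
print(median_frpl)
print(pearson_r)
```

Run against the full cleaned frame, the same three summaries would quantify how many schools fall in each group, where each group sits on the FRPL axis, and whether any trend in student-teacher ratio is present.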

Bubble Plot¶

The bubble plot uses a sample of 1000 schools and color-codes each point to show whether a school is predominately White, Black, or Hispanic. This plot shows findings similar to the scatter plots: predominately White schools are concentrated at lower lunch rate percentages; predominately Hispanic schools are dispersed across the plot, though more concentrated as the lunch rate increases; and predominately Black schools are few in number, with points that trend towards higher lunch rate percentages.

Recommendations¶

Based on this analysis, key findings indicate inequities within the U.S. public school system that could be addressed through policy and intervention. The first recommendation is to intentionally identify schools with a disproportionate number of students receiving FRPL, rather than allow them to be overlooked by only considering mean values, especially when the data includes significant outliers. The second issue is the uneven distribution of Black and Hispanic/Latino students across schools, which highlights racial and socioeconomic issues. This could be addressed by identifying any barriers predominately Black or Hispanic schools might face, and by prioritizing resource allocation to schools with a large concentration of Black and Hispanic students in high-poverty areas.

Conclusion¶

Though this research sought to explore how school funding and resource allocation could affect youth sports participation, this analysis also highlighted broad racial and socioeconomic disparities in the U.S. public school system. By using children's accessibility and opportunity for sports participation as an example, the findings from this analysis point to a broader issue -- systemic inequities that can affect students' experiences and opportunities. To address these inequities, policymakers should be intentional and take notice of marginalized and under-resourced communities to ensure all students and school districts receive equitable opportunity and treatment.

References¶

Aspen Institute. (2023). State of Play 2023. Project Play. https://projectplay.org/state-of-play-2023/participation

Carlson, J. A., Mignano, A. M., Norman, G. J., McKenzie, T. L., Kerr, J., Arredondo, E. M., Madanat, H., Cain, K. L., Elder, J. P., Saelens, B. E., & Sallis, J. F. (2014). Socioeconomic disparities in elementary school practices and children's physical activity during school. American journal of health promotion : AJHP, 28(3 Suppl), S47–S53. https://doi.org/10.4278/ajhp.130430-QUAN-206

Young, D. R., Felton, G. M., Grieser, M., Elder, J. P., Johnson, C., Lee, J. S., & Kubik, M. Y. (2007). Policies and opportunities for physical activity in middle school environments. The Journal of school health, 77(1), 41–47. https://doi.org/10.1111/j.1746-1561.2007.00161.x